---
id: configuration
title: Configuration
description: Configuring Crawlee parameters
---

import ApiLink from '@site/src/components/ApiLink';

&#8203;<ApiLink to="core/class/Configuration">`Configuration`</ApiLink> is a class holding Crawlee configuration parameters. By default, you don't need to set or change any of them, but for certain use cases you might want to do so, e.g. in order to change the default storage directory, or enable verbose error logging, and so on.

There are three ways of changing the configuration parameters:

- adding `crawlee.json` file to your project
- setting environment variables
- using the `Configuration` class

You could also combine all the above, but you should keep in mind, that the precedence for these 3 options is the following:
***`crawlee.json`*** < ***constructor options*** < ***environment variables***.

`crawlee.json` is a baseline. The options provided in the `Configuration` constructor will override the options provided in the JSON. Environment variables will override both.

## `crawlee.json`

The first option you could use for configuring Crawlee is `crawlee.json` file. The only thing you need to do is specify the <ApiLink to="core/interface/ConfigurationOptions">`ConfigurationOptions`</ApiLink> in the file, place the file in the root of your project, and Crawlee will use provided options as global configuration.

```json title="crawlee.json"
{
  "persistStateIntervalMillis": 10000,
  "logLevel": "DEBUG"
}
```

With `crawlee.json` you don't need to do anything else in the code:

```js
import { CheerioCrawler, sleep } from 'crawlee';
// We are not importing nor passing
// the Configuration to the crawler.
// We are not assigning any env vars either.
const crawler = new CheerioCrawler();

crawler.router.addDefaultHandler(async ({ request }) => {
    // for the first request we wait for 5 seconds,
    // and add the second request to the queue
    if (request.url === 'https://www.example.com/1') {
        await sleep(5_000);
        await crawler.addRequests(['https://www.example.com/2'])
    }
    // for the second request we wait for 10 seconds,
    // and abort the run
    if (request.url === 'https://www.example.com/2') {
        await sleep(10_000);
        process.exit(0);
    }
});

await crawler.run(['https://www.example.com/1']);
```

If you run this example (assuming you placed the `crawlee.json` file with `persistStateIntervalMillis` and `logLevel` specified there in the root of your project), you will find the `SDK_CRAWLER_STATISTICS` file in default Key-Value store,
which would show, that there's 1 finished request and crawler runtime was ~10 seconds.
This confirms that the state was persisted after 10 seconds, as it was set in `crawlee.json`.
Besides, you should see `DEBUG` logs in addition to `INFO` ones in your terminal, as `logLevel` was set to `DEBUG` in the `crawlee.json`, meaning Crawlee picked both provided options correctly.

## Environment Variables

Another way of configuring Crawlee is setting environment variables.
The following is a list of the environment variables used by Crawlee that are available to the user.

### Important env vars

The following environment variables have large impact on the way Crawlee works and its behavior
can be changed significantly by setting or unsetting them.

#### `CRAWLEE_STORAGE_DIR`

Defines the path to a local directory where <ApiLink to="core/class/KeyValueStore">`KeyValueStore`</ApiLink>, <ApiLink to="core/class/Dataset">`Dataset`</ApiLink>, and <ApiLink to="core/class/RequestQueue">`RequestQueue`</ApiLink> store their data. By default, it is set to `./storage`.

#### `CRAWLEE_DEFAULT_DATASET_ID`

The default dataset has ID `default`. Setting this environment variable overrides the default dataset ID with the provided value.

#### `CRAWLEE_DEFAULT_KEY_VALUE_STORE_ID`

The default key-value store has ID `default`. Setting this environment variable overrides the default key-value store ID with the provided value.

#### `CRAWLEE_DEFAULT_REQUEST_QUEUE_ID`

The default request queue has ID `default`. Setting this environment variable overrides the default request queue ID with the provided value.

#### `CRAWLEE_PURGE_ON_START`

Storage directories are purged by default. If set to `false` - local storage directories would not be purged automatically at the start of the crawler run or before opening of some storage explicitly (e.g. via `Dataset.open()`). Useful if we're trying e.g. to add more items to dataset with each next run (and keep the previously saved/scraped items).

### Convenience env vars

The next group includes env vars that can help achieve certain goals without having to change
our code, such as temporarily switching log level to DEBUG or enabling verbose logging for errors.

#### `CRAWLEE_HEADLESS`

If set to `1`, web browsers launched by Crawlee will run in the headless mode. We can still override
this setting in the code, e.g. by passing the `headless: true` option to the <ApiLink to="puppeteer-crawler/function/launchPuppeteer">`launchPuppeteer()`</ApiLink> function. By default, the browsers
are launched in headful mode, i.e. with windows.

#### `CRAWLEE_LOG_LEVEL`

Specifies the minimum log level, which can be one of the following values (in order of severity):
`DEBUG`, `INFO`, `WARNING`, `ERROR` and `OFF`. By default, the log level is set to `INFO`,
which means that `DEBUG` messages are not printed to console. See the <ApiLink to="core/class/Log">`utils.log`</ApiLink>
namespace for logging utilities.

#### `CRAWLEE_VERBOSE_LOG`

Enables verbose logging if set to `true`. If not explicitly set to `true` - for errors thrown from inside request handler a warning with only error message will be logged as long as we know the request will be retried. Same applies to some known errors (such as timeout errors). Disabled by default.

#### `CRAWLEE_MEMORY_MBYTES`

Sets the amount of system memory in megabytes to be used by the <ApiLink to="core/class/AutoscaledPool">`AutoscaledPool`</ApiLink>.
It is used to limit the number of concurrently running tasks. By default, the max amount of memory
to be used is set to one quarter of total system memory, i.e. on a system with 8192 MB of memory,
the autoscaling feature will only use up to 2048 MB of memory.

## Configuration class

The last option to adjust Crawlee configuration is to use the <ApiLink to="core/class/Configuration">`Configuration`</ApiLink> class in the code.

### Global Configuration

By default, there is a global singleton instance of `Configuration` class, it is used by the crawlers and some other classes that depend on a configurable behavior. In most cases you don't need to adjust any options there, but if needed - you can get access to it via <ApiLink to="core/class/Configuration#getGlobalConfig">`Configuration.getGlobalConfig()`</ApiLink> function. Now you can easily <ApiLink to="core/class/Configuration#get">`get`</ApiLink> and <ApiLink to="core/class/Configuration#set">`set`</ApiLink> the <ApiLink to="core/interface/ConfigurationOptions">`ConfigurationOptions`</ApiLink>.

```js
import { CheerioCrawler, Configuration, sleep } from 'crawlee';

// Get the global configuration
const config = Configuration.getGlobalConfig();
// Set the 'persistStateIntervalMillis' option
// of global configuration to 10 seconds
config.set('persistStateIntervalMillis', 10_000);

// Note, that we are not passing the configuration to the crawler
// as it's using the global configuration
const crawler = new CheerioCrawler();

crawler.router.addDefaultHandler(async ({ request }) => {
    // For the first request we wait for 5 seconds,
    // and add the second request to the queue
    if (request.url === 'https://www.example.com/1') {
        await sleep(5_000);
        await crawler.addRequests(['https://www.example.com/2'])
    }
    // For the second request we wait for 10 seconds,
    // and abort the run
    if (request.url === 'https://www.example.com/2') {
        await sleep(10_000);
        process.exit(0);
    }
});

await crawler.run(['https://www.example.com/1']);
```

This is pretty much the same example we used for showing `crawlee.json` usage,
but now we're using the global configuration, which is the only difference.
If you run this example - you will find the `SDK_CRAWLER_STATISTICS` file in default Key-Value store as before,
which would show the same number of finishes requests (one) and the same crawler runtime (~10 seconds).
This confirms that provided parameters worked: the state was persisted after 10 seconds, as it was set in the global configuration.

:::note

After running the same example with commented two lines of code related to `Configuration` there will be
no `SDK_CRAWLER_STATISTICS` file stored in the default Key-Value store:
as we did not change the `persistStateIntervalMillis`, Crawlee used the default value of 60 seconds,
and the crawler was forcefully aborted after ~15 seconds of run time before it persisted the state for the first time.

:::

### Custom configuration

Alternatively, you can create a custom configuration. In this case you need to pass it to the class that is going to use it, e.g. to the crawler. Let's adjust the previous example:

```js
import { CheerioCrawler, Configuration, sleep } from 'crawlee';

// Create new configuration
const config = new Configuration({
    // Set the 'persistStateIntervalMillis' option to 10 seconds
    persistStateIntervalMillis: 10_000,
});

// Now we need to pass the configuration to the crawler
const crawler = new CheerioCrawler({}, config);

crawler.router.addDefaultHandler(async ({ request }) => {
    // for the first request we wait for 5 seconds,
    // and add the second request to the queue
    if (request.url === 'https://www.example.com/1') {
        await sleep(5_000);
        await crawler.addRequests(['https://www.example.com/2'])
    }
    // for the second request we wait for 10 seconds,
    // and abort the run
    if (request.url === 'https://www.example.com/2') {
        await sleep(10_000);
        process.exit(0);
    }
});

await crawler.run(['https://www.example.com/1']);
```

If you run this example - it would work exactly the same as before,
with the same `SDK_CRAWLER_STATISTICS` file in default Key-Value store after the run,
showing the same number of finished requests and the same crawler run time.

:::note

If you would not pass the configuration to the crawler, there again will be
no `SDK_CRAWLER_STATISTICS` file stored in the default Key-Value store, this time for a different reason though.
Since we did not pass the configuration to the crawler,
the crawler will use the global configuration, which is using the default `persistStateIntervalMillis`.
So again, the run was aborted before the state was persisted for the first time.

:::
